Elissa: A Dialectal to Standard Arabic Machine Translation System

Authors

  • Wael Salloum
  • Nizar Habash
Abstract

Modern Standard Arabic (MSA) has a wealth of natural language processing (NLP) tools and resources. In comparison, resources for dialectal Arabic (DA), the unstandardized spoken varieties of Arabic, are still lacking. We present Elissa, a machine translation (MT) system from DA to MSA. Elissa (version 1.0) employs a rule-based approach that relies on morphological analysis, morphological transfer rules and dictionaries, in addition to language models, to produce MSA paraphrases of dialectal sentences. Elissa can be employed as a general preprocessor for dialectal Arabic when using MSA NLP tools.
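The abstract describes generating MSA paraphrases from dictionaries and rules and then using a language model to choose among them. The following is a minimal sketch of that general idea only, not Elissa's implementation: it omits morphological analysis and transfer rules entirely, and the lexicon entries, the rough romanization, the bigram counts, and the names `TOY_LEXICON`, `TOY_BIGRAMS`, `score` and `paraphrase` are all hypothetical, chosen for illustration.

```python
import math
from itertools import product

# Hypothetical toy lexicon (NOT from Elissa): roughly romanized dialectal
# token -> candidate MSA tokens.
TOY_LEXICON = {
    "mish": ["laysa", "la"],
    "ayiz": ["urid"],
    "aruh": ["adhhab"],
    "dilwaqti": ["al-an"],
}

# Hypothetical bigram counts standing in for an MSA language model.
TOY_BIGRAMS = {
    ("<s>", "la"): 4,
    ("<s>", "laysa"): 2,
    ("la", "urid"): 6,
    ("urid", "adhhab"): 3,
    ("adhhab", "al-an"): 2,
    ("al-an", "</s>"): 2,
}

# Vocabulary size used for add-one smoothing.
VOCAB_SIZE = len({word for pair in TOY_BIGRAMS for word in pair})


def score(tokens):
    """Add-one-smoothed bigram log-probability of a token sequence."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        count = TOY_BIGRAMS.get((prev, cur), 0)
        context = sum(c for (p, _), c in TOY_BIGRAMS.items() if p == prev)
        total += math.log((count + 1) / (context + VOCAB_SIZE))
    return total


def paraphrase(sentence):
    """Return the highest-scoring word-by-word paraphrase of a sentence."""
    # Tokens missing from the lexicon are passed through unchanged.
    options = [TOY_LEXICON.get(tok, [tok]) for tok in sentence.split()]
    candidates = product(*options)
    return " ".join(max(candidates, key=score))
```

Calling `paraphrase("mish ayiz aruh dilwaqti")` enumerates both candidate sequences and keeps the one the toy bigram model prefers. Exhaustive enumeration is only workable for toy inputs; a real system would need a lattice or beam search over candidates.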

Similar resources

Dialectal Arabic to English Machine Translation: Pivoting through Modern Standard Arabic

Modern Standard Arabic (MSA) has a wealth of natural language processing (NLP) tools and resources. In comparison, resources for dialectal Arabic (DA), the unstandardized spoken varieties of Arabic, are still lacking. We present ELISSA, a machine translation (MT) system for DA to MSA. ELISSA employs a rule-based approach that relies on morphological analysis, transfer rules and dictionaries in ...

Multi-Lingual Phrase-Based Statistical Machine Translation for Arabic-English

In this paper, we implement a multilingual statistical machine translation (SMT) system for Arabic-English translation. Arabic text can be categorized into standard and dialectal Arabic. These two forms of Arabic differ significantly. Different mono-lingual and multi-lingual hybrid SMT approaches are compared. Mono-lingual systems do always result in better translation accuracy in one Arabic fo...

Dialectal to Standard Arabic Paraphrasing to Improve Arabic-English Statistical Machine Translation

This paper is about improving the quality of Arabic-English statistical machine translation (SMT) on dialectal Arabic text using morphological knowledge. We present a lightweight rule-based approach to producing Modern Standard Arabic (MSA) paraphrases of dialectal Arabic out-of-vocabulary (OOV) words and low-frequency words. Our approach extends an existing MSA analyzer with a small number of...

Arabic Dialect Handling in Hybrid Machine Translation

In this paper, we describe an extension to a hybrid machine translation system for handling dialect Arabic, using a decoding algorithm to normalize non-standard, spontaneous and dialectal Arabic into Modern Standard Arabic. We prove the feasibility of the approach by measuring and comparing machine translation results in terms of BLEU with and without the proposed approach. We show in our tests...

Exploiting Out-of-Domain Data Sources for Dialectal Arabic Statistical Machine Translation

Statistical machine translation for dialectal Arabic is characterized by a lack of data since data acquisition involves the transcription and translation of spoken language. In this study we develop techniques for extracting parallel data for one particular dialect of Arabic (Iraqi Arabic) from out-of-domain corpora in different dialects of Arabic or in Modern Standard Arabic. We compare two dif...

Publication date: 2012